Bacon: A GPU Programming Language With Just in Time Specialization (Draft)
نویسنده
چکیده
This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly more convenient and enabling pre-optimizations based on just-in-time specialization as this code is compiled via OpenCL at runtime. Benchmarks are provided for matrix multiplication comparing a Bacon implementation to versions using OpenCL directly. Speedups are demonstrated both for naive implementations and when comparing a Bacon implementation of generalized block decomposed matrix multiplication to a hand-vectorized OpenCL kernel. This latter result demonstrates the benefit of the total loop unrolling enabled by just-in-time specialization. Additionally, a Bacon implementation of a more complex computation, stereo disparity, is considered.
منابع مشابه
Bacon: A GPU Programming System With Just in Time Specialization
This paper describes Bacon, a data-parallel programming system targeting OpenCL-compatible graphics processors. This system is built upon the existing OpenCL standard in order to make it easier for programmers to write high performance kernels for GPU accelerated applications. The OpenCL C syntax is extended into a new language, Bacon C, intended to make development significantly more convenien...
متن کاملUltra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is in the order of seconds, we can use Tomosynthesis systems to perform Tomosynthesis-guided Interventional procedures. This research has been designed to study u...
متن کاملCUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications
Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally-intensive problems must eventually be recoded in a low level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware; unless the h...
متن کاملEnabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce
Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via justin-time compilation. We extend SEJITS to exploit intermachine parallelism by targeting clusters of machines via...
متن کاملAccelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011